2.2 Sufficiency

#SufficientStatistic #MinimalSufficient #FactorizationTheorem #ExponentialFamily

1 Definition

Sufficiency is a central concept because it allows us to focus on the essential aspects of a dataset while ignoring irrelevant details.

Statistic, Sufficiency

A statistic $T (X)$ is any function of the data $X$ (it must not depend on the parameter).
A statistic $T (X)$ is sufficient (for model $P$ ) if the conditional distribution of $X$ given $T (X)$ is the same for all $P_{θ} \in P$ , i.e. it does not depend on $θ$ .

In short, sufficient statistics carry all information about $θ$ .

Example

Back to the coin flipping example. For the three assumptions, we denote respectively $\begin{aligned} X_{+ +} & \sim Binom (n, θ), \\ X_{i +} & \sim Binom (n_{i}, θ_{i}), \quad 1 \leq i \leq 48, \\ X_{i, j} & \sim Bernoulli (θ_{i, j}), \quad 1 \leq i \leq 48, \ 1 \leq j \leq n_{i} . \end{aligned}$
Check for $T (X) = X_{+ +}$ : writing $x_{+ +} = \sum_{i, j} x_{i, j}$ and taking every flip to be Bernoulli with a common $θ$ , we have $p_{θ} (x) = \prod_{i = 1}^{48} \prod_{j = 1}^{n_{i}} θ^{x_{i, j}} (1 - θ)^{1 - x_{i, j}} = θ^{x_{+ +}} (1 - θ)^{n - x_{+ +}},$ so $\begin{aligned} P_{θ} (X = x | X_{+ +} = t) & = \frac{P_{θ} (X = x, X_{+ +} = t)}{P_{θ} (X_{+ +} = t)} \\ & = \frac{1 \{x_{+ +} = t\} θ^{t} (1 - θ)^{n - t}}{\binom{n}{t} θ^{t} (1 - θ)^{n - t}} = \binom{n}{t}^{- 1} 1 \{x_{+ +} = t\} . \end{aligned}$
This expression does not depend on $θ$ , so $T (X) = X_{+ +}$ is sufficient for model 2.
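The cancellation above can be checked by brute force on a small case. The following is only a sketch: the function name `conditional_given_sum` and the five-flip sequence are illustrative assumptions, and the flips are treated as i.i.d. Bernoulli with a common $θ$ .

```python
from itertools import product
from math import comb, isclose

def conditional_given_sum(x, theta):
    """P_theta(X = x | sum(X) = sum(x)) for i.i.d. Bernoulli(theta) flips, by enumeration."""
    n, t = len(x), sum(x)

    def pmf(z):
        s = sum(z)
        return theta ** s * (1 - theta) ** (n - s)

    # Denominator: total probability of all sequences with the same number of heads.
    denom = sum(pmf(z) for z in product((0, 1), repeat=n) if sum(z) == t)
    return pmf(x) / denom

x = (1, 0, 1, 1, 0)                      # hypothetical observed sequence
probs = [conditional_given_sum(x, th) for th in (0.1, 0.5, 0.9)]
expected = 1 / comb(len(x), sum(x))      # the theta-free value binom(n, t)^{-1}
same_for_all_theta = all(isclose(p, expected) for p in probs)
```

For every value of $θ$ tried, the conditional probability agrees with $\binom{n}{t}^{- 1}$ , which is what sufficiency of the total count predicts.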

2 Factorization Theorem

Theorem (Factorization Theorem)

Let $P = \{P_{θ} | θ \in Θ\}$ be a model with densities $p_{θ} (x)$ with respect to a common measure $μ$ . Then $T (X)$ is sufficient iff there exist functions $g_{θ} (t) \geq 0$ and $h (x) \geq 0$ with $p_{θ} (x) = g_{θ} (T (x)) \cdot h (x)$ for almost every $x$ under $μ$ .

Proof for Discrete $X$

The proof can also be seen here. For this note, we only check the discrete case. WLOG assume $μ$ is the counting measure.

Assume the factorization $p_{θ} (x) = g_{θ} (T (x)) h (x)$ exists. Then for any $t$ with $P_{θ} (T (X) = t) > 0$ we have $\begin{aligned} P_{θ} (X = x | T (X) = t) & = \frac{P_{θ} (X = x, T (X) = t)}{P_{θ} (T (X) = t)} \\ & = \frac{g_{θ} (t) h (x) 1 \{T (x) = t\}}{g_{θ} (t) \sum_{z : T (z) = t} h (z)} \\ & = \frac{h (x) 1 \{T (x) = t\}}{\sum_{z : T (z) = t} h (z)}, \end{aligned}$
which does not depend on $θ$ .
Conversely, if $T (X)$ is sufficient, construct $g_{θ} (t) = P_{θ} (T (X) = t), h (x) = P (X = x | T (X) = T (x)),$ where $h (x)$ does not depend on $θ$ by sufficiency. Then $P_{θ} (X = x) = P_{θ} (T (X) = T (x)) \cdot P (X = x | T (X) = T (x)) = g_{θ} (T (x)) h (x) .$
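As a sanity check of the converse construction, the sketch below builds $g_{θ}$ and $h$ for i.i.d. Bernoulli flips with $T (x) = \sum_{i} x_{i}$ and verifies $p_{θ} (x) = g_{θ} (T (x)) h (x)$ exactly. The Bernoulli setting, the helper names `pmf`, `g`, `h`, and the choice $n = 4$ are assumptions made for illustration; exact rational arithmetic avoids floating-point noise.

```python
from fractions import Fraction
from itertools import product
from math import comb

def pmf(x, theta):
    """Joint pmf of i.i.d. Bernoulli(theta) flips x."""
    t = sum(x)
    return theta ** t * (1 - theta) ** (len(x) - t)

def g(t, n, theta):
    """g_theta(t) = P_theta(T(X) = t), i.e. the Binomial(n, theta) pmf."""
    return comb(n, t) * theta ** t * (1 - theta) ** (n - t)

def h(x):
    """h(x) = P(X = x | T(X) = T(x)): uniform over sequences with the same sum."""
    return Fraction(1, comb(len(x), sum(x)))

n = 4
thetas = [Fraction(1, 5), Fraction(1, 2), Fraction(7, 10)]
factorization_holds = all(
    pmf(x, th) == g(sum(x), n, th) * h(x)
    for th in thetas
    for x in product((0, 1), repeat=n)
)
```

Here `h` never sees `theta`, mirroring the requirement that $h (x)$ is free of the parameter.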

Example

An exponential family $p_{η} (x) = e^{η^{T} T (x) - A (η)} h (x)$ has exactly this form, so $T (X)$ is sufficient.
Normal distribution family: $X_{1}, \dots, X_{n} \overset{i . i . d .}{\sim} N (θ, 1)$ , with density $\frac{1}{\sqrt{2 π}} e^{- \frac{(x - θ)^{2}}{2}}$ . Then $p_{θ} (x) = \prod_{i = 1}^{n} \frac{1}{\sqrt{2 π}} e^{- \frac{(x_{i} - θ)^{2}}{2}} = \underbrace{e^{θ \sum_{i = 1}^{n} x_{i} - \frac{n θ^{2}}{2}}}_{g_{θ} (\sum_{i = 1}^{n} x_{i})} \cdot \underbrace{(2 π)^{- \frac{n}{2}} e^{- \frac{1}{2} \sum_{i = 1}^{n} x_{i}^{2}}}_{h (x)} .$
So $T (X) = \sum_{i = 1}^{n} X_{i}$ is sufficient.
Poisson family: $X_{1}, \dots, X_{n} \overset{i . i . d .}{\sim} Poisson (θ)$ . So $p_{θ} (x) = \prod_{i = 1}^{n} \frac{θ^{x_{i}} e^{- θ}}{x_{i}!} = \underbrace{θ^{\sum_{i = 1}^{n} x_{i}} e^{- n θ}}_{g_{θ} (\sum_{i = 1}^{n} x_{i})} \cdot \underbrace{{(\prod_{i = 1}^{n} x_{i}!)}^{- 1}}_{h (x)} .$ So $T (X) = \sum_{i = 1}^{n} X_{i}$ is sufficient.
Uniform location family: $X_{1}, \dots, X_{n} \overset{i . i . d .}{\sim} Uniform (θ, θ + 1)$ . So $\begin{aligned} p_{θ} (x) & = \prod_{i = 1}^{n} 1 \{θ \leq x_{i} \leq θ + 1\} \\ & = 1 \{θ \leq \min_{i} x_{i}\} \cdot 1 \{\max_{i} x_{i} \leq θ + 1\} . \end{aligned}$ Here $h (x) = 1$ , so $T (X) = (\min_{i} X_{i}, \max_{i} X_{i})$ is sufficient.
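For the Poisson family, the factorization says the conditional law of $X$ given $\sum_{i} X_{i} = t$ is $h (x) / \sum_{z : T (z) = t} h (z)$ . A standard fact is that this conditional law is Multinomial( $t$ ; $1 / n, \dots, 1 / n$ ). The sketch below compares the two in exact arithmetic; the helper names and the sample point `(2, 0, 3)` are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial, prod

def compositions(total, parts):
    """Yield all tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def h(x):
    """theta-free factor of the Poisson joint pmf: 1 / prod(x_i!)."""
    return Fraction(1, prod(factorial(xi) for xi in x))

def conditional_from_h(x):
    """h(x) divided by the sum of h(z) over all z with the same total as x."""
    t, n = sum(x), len(x)
    return h(x) / sum(h(z) for z in compositions(t, n))

def multinomial_uniform(x):
    """Multinomial(t; 1/n, ..., 1/n) pmf evaluated at x."""
    t, n = sum(x), len(x)
    return Fraction(factorial(t), prod(factorial(xi) for xi in x)) / n ** t

x = (2, 0, 3)   # hypothetical sample with n = 3 and t = 5
cond = conditional_from_h(x)
```

No value of $θ$ enters the computation at all, which is the point: once $t$ is known, the remaining randomness in $x$ is parameter-free.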

Another example is the order statistics. Let $X_{1}, \dots, X_{n} \overset{i . i . d .}{\sim} P_{θ}$ for any model $P = \{P_{θ}^{n} | θ \in Θ\}$ on $\mathcal{X} \subset R$ . If $P_{θ}^{n}$ is invariant to permutations of $X = (X_{1}, \dots, X_{n})$ (which holds for i.i.d. data; see exchangeability), then $S (X) = (X_{(1)}, \dots, X_{(n)})$ is sufficient.

3 Minimal Sufficiency

For the example of $N (θ, 1)$ , we showed that $\sum_{i = 1}^{n} X_{i}$ is sufficient. Then $\bar{X} = \frac{1}{n} \sum_{i = 1}^{n} X_{i}$ is also sufficient, as is the vector of order statistics $S (X) = (X_{(1)}, \dots, X_{(n)})$ .
Some sufficient statistics compress the data more than others. For example, $\bar{X}$ can be recovered from $S (X)$ , but not the other way around.

Proposition

If $T (X)$ is sufficient and $T (X) = f (S (X))$ for some function $f$ , then $S (X)$ is sufficient.

Proof

By the factorization theorem we have $p_{θ} (x) = g_{θ} (T (x)) h (x) = (g_{θ} \circ f) (S (x)) h (x),$ showing that $S (X)$ is sufficient.

Minimal Sufficient

$T (X)$ is minimal sufficient if

$T (X)$ is sufficient.
For any other sufficient statistic $S (X)$ , $T (X) = f (S (X))$ for some function $f$ (almost surely under every $P_{θ} \in P$ ).

We say $x, y \in \mathcal{X}$ are equivalent (denoted $x \equiv_{P} y$ ) if $\frac{p_{θ} (x)}{p_{θ} (y)}$ does not depend on $θ$ .

For the log-likelihood $l (θ; x) = \log p_{θ} (x)$ , this means $l (θ; x) = l (θ; y) + c_{x y}$ , where $c_{x y}$ does not depend on $θ$ .
For sufficient $T (X)$ , if $T (x) = T (y) = t$ , $\frac{p_{θ} (x)}{p_{θ} (y)} = \frac{P_{θ} (X = x, T (X) = t)}{P_{θ} (X = y, T (X) = t)} = \frac{P (X = x | T (X) = t)}{P (X = y | T (X) = t)},$ which does not depend on $θ$ by sufficiency. So we know that the following relation always holds:

$T (x) = T (y) \Rightarrow x \equiv_{P} y .$

Theorem

$T (X)$ is minimal sufficient if $x \equiv_{P} y ⟺ T (x) = T (y)$ for all $x, y$ .

Proof

First, show $T (X)$ is sufficient. For every $x$ with $T (x) = t$ , we have $P_{θ} (X = x | T (X) = t) = \frac{p_{θ} (x)}{\sum_{z : T (z) = t} p_{θ} (z)} = {(\sum_{z : T (z) = t} \frac{p_{θ} (z)}{p_{θ} (x)})}^{- 1},$ which does not depend on $θ$ , because every $z$ in the sum has $T (z) = T (x)$ and hence $z \equiv_{P} x$ .
Then show $T (X)$ is minimal. Let $S (X)$ be another sufficient statistic. Suppose $S (x) = S (y) = s$ ; then $x \equiv_{P} y$ by sufficiency of $S$ , so by the assumption of the theorem $T (x) = T (y)$ . Hence $T$ is constant on each level set of $S$ , and we can define $f (s) = T (x)$ for any $x$ with $S (x) = s$ .
For any other $z$ with $S (z) = s$ , we must also have $T (z) = T (x) = f (s)$ . So $T (x) = f (S (x))$ for all $x$ .

Q.E.D.
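The criterion can be illustrated on a tiny sample space. The sketch below groups all outcomes of three i.i.d. Bernoulli flips by their likelihood-ratio behaviour and compares the groups with the level sets of $T (x) = \sum_{i} x_{i}$ . The Bernoulli model, the reference value `THETA_REF`, and the grid `THETA_GRID` are assumptions for illustration. Note that $\frac{p_{θ} (x)}{p_{θ} (y)}$ is constant in $θ$ exactly when $\frac{p_{θ} (x)}{p_{θ_{0}} (x)} = \frac{p_{θ} (y)}{p_{θ_{0}} (y)}$ for all $θ$ ; a finite grid can only test finitely many values, so agreement on the grid is evidence rather than proof.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def pmf(x, theta):
    """Joint pmf of i.i.d. Bernoulli(theta) flips x."""
    t = sum(x)
    return theta ** t * (1 - theta) ** (len(x) - t)

THETA_REF = Fraction(1, 2)                                   # hypothetical reference theta_0
THETA_GRID = [Fraction(1, 3), Fraction(1, 4), Fraction(4, 5)]  # hypothetical test grid

def ratio_signature(x):
    """Likelihood ratios p_theta(x) / p_ref(x) on the grid, in exact arithmetic."""
    return tuple(pmf(x, th) / pmf(x, THETA_REF) for th in THETA_GRID)

# Points with the same signature are equivalent (as far as the grid can tell).
classes = defaultdict(set)
for x in product((0, 1), repeat=3):
    classes[ratio_signature(x)].add(x)

# Collect the values of T(x) = sum(x) that occur inside each class.
sums_per_class = sorted(sorted({sum(x) for x in cls}) for cls in classes.values())
```

Each equivalence class contains exactly one value of the sum, and each value of the sum appears in exactly one class, so the classes coincide with the level sets of $T$ .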

Example (Laplace Location Family)

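$X_{1}, \dots, X_{n} \overset{i . i . d .}{\sim} p_{θ} (x) = \frac{1}{2} e^{- | x - θ |}$ . Then $l (θ; x) = - \sum_{i = 1}^{n} | x_{i} - θ | - n \log 2$ . As a function of $θ$ , this is continuous and piecewise linear, with kinks exactly at the data points $x_{i}$ . Two such functions differ by a constant only if they have the same kinks (with multiplicity), so $x \equiv_{P} y ⟺ (x_{(1)}, \dots, x_{(n)}) = (y_{(1)}, \dots, y_{(n)})$ . So $S (X) = (X_{(1)}, \dots, X_{(n)})$ is minimal sufficient.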
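A small numerical illustration of the Laplace example: the log-likelihood gap between a sample and any permutation of it is zero for every $θ$ , while the gap between two samples with different order statistics varies with $θ$ . The sample values and the $θ$ grid below are arbitrary assumptions chosen for illustration.

```python
from itertools import permutations
from math import log

def loglik(theta, x):
    """Laplace location log-likelihood: -sum |x_i - theta| - n log 2."""
    return -sum(abs(xi - theta) for xi in x) - len(x) * log(2)

def loglik_gap(x, y, thetas):
    """l(theta; x) - l(theta; y) evaluated at each theta in `thetas`."""
    return [loglik(th, x) - loglik(th, y) for th in thetas]

thetas = [-1.0, 0.5, 2.0, 4.0, 7.0]
x = (0.0, 1.0, 5.0)
y = (1.0, 2.0, 3.0)   # same sum as x, but different order statistics

# Permuting x leaves the log-likelihood unchanged, so every gap is 0.
perm_gaps = [loglik_gap(x, p, thetas) for p in permutations(x)]
# For y the gap changes with theta, so x and y are not equivalent.
xy_gap = loglik_gap(x, y, thetas)
gap_spread = max(xy_gap) - min(xy_gap)
```

The pair `x`, `y` has the same sum, so the example also shows that $\sum_{i} X_{i}$ alone would not be sufficient here.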

3.1 Minimal Form

Minimal Form

The form $p_{η} (x) = e^{η^{T} T (x) - A (η)} h (x)$ , $η \in Ξ \subset R^{s}$ , is minimal if neither the parameter $η$ nor the statistic $T (X)$ satisfies a linear constraint, i.e. there is no nonzero vector $a \in R^{s}$ and scalar $b \in R$ s.t. $η^{T} a = b, \ \forall η \in Ξ, \quad \text{or} \quad T (X)^{T} a \overset{P \text{-a.s.}}{=} b .$

Otherwise we can represent $P$ as an $r$-dimensional exponential family for some $r < s$ .
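On a finite support, the constraint on $T (X)$ can be tested mechanically: a nonzero $a$ and scalar $b$ with $T (x)^{T} a = b$ for all $x$ exist exactly when the differences $T (x) - T (x_{0})$ span fewer than $s$ dimensions. The sketch below checks this for a single categorical draw with three categories, an example assumed here for illustration and not taken from the note. It covers only the $T$ side of the definition, not the constraint on $η$ , and the helper names are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of equal-length vectors via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def satisfies_affine_constraint(points):
    """True if some nonzero a and scalar b give a^T t = b for every t in points."""
    base = points[0]
    diffs = [[ti - bi for ti, bi in zip(t, base)] for t in points[1:]]
    return rank(diffs) < len(base)

# T(x) = one-hot indicator of all three categories: coordinates always sum to 1.
one_hot = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
# Dropping the last coordinate removes the redundancy.
reduced = [(1, 0), (0, 1), (0, 0)]

one_hot_constrained = satisfies_affine_constraint(one_hot)
reduced_constrained = satisfies_affine_constraint(reduced)
```

The full one-hot statistic is constrained (take $a = (1, 1, 1)$ , $b = 1$ ), so that three-dimensional form is not minimal, while the two-dimensional version has no such constraint on $T$ .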

Proposition

If $p_{η}$ is a minimal form, then $T (X)$ is minimal sufficient.

Proof

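By the theorem above, and since $T (x) = T (y) \Rightarrow x \equiv_{P} y$ always holds for sufficient $T$ , we only need to show $x \equiv_{P} y \Rightarrow T (x) = T (y)$ . In terms of the log-likelihood, we need to show $l (\cdot; x) = l (\cdot; y) + c_{x y} \Rightarrow T (x) = T (y)$ .
Now, $l (η; x) - l (η; y) = η^{T} (T (x) - T (y)) + \log h (x) - \log h (y) .$ If $T (x) - T (y) \neq 0$ , minimality lets us find $η, ξ \in Ξ$ s.t. $η^{T} (T (x) - T (y)) \neq ξ^{T} (T (x) - T (y))$ (otherwise $a = T (x) - T (y)$ would give a linear constraint $η^{T} a = b$ on $Ξ$ ). Then $l (η; x) - l (η; y)$ depends on $η$ , contradicting the fact that $c_{x y}$ does not depend on $η$ . So $T (x) = T (y)$ . Q.E.D.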

The converse of this proposition is not true.

3.2 Diagram

For case $s = 2$ , let's consider the following example:
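Figure (Pasted image 20241206202340.png): three parameter sets $A$ , $B$ , $C$ in the $η$ -plane; $A$ and $B$ are non-linear, $C$ is a line $η_{0} + θ γ$ .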

$A$ and $B$ are non-linear, so the form is minimal on them (no nonzero $a$ makes $a^{T} η$ constant along the curve).
$C$ is not minimal when $s = 2$ : take a normal vector $γ_{⊥}$ of $γ$ ; writing any point on $C$ as $η = η_{0} + θ γ$ , we have $γ_{⊥}^{T} η = γ_{⊥}^{T} (η_{0} + θ γ) = γ_{⊥}^{T} η_{0},$ which is a linear constraint on $η$ and so violates the definition.
However, $C$ becomes minimal once it is rewritten with $s = 1$ . In this case, $e^{η^{T} T (x) - A (η)} h (x) = e^{θ (γ^{T} T (x)) - A (η_{0} + θ γ)} \cdot e^{η_{0}^{T} T (x)} h (x),$ so take $γ^{T} T (x)$ as the new $T (x)$ , $θ$ as the new natural parameter, and $e^{η_{0}^{T} T (x)} h (x)$ as the new $h (x)$ .
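The linear constraint on the line $C$ is easy to verify numerically. In the sketch below the base point `eta0` and direction `gamma` are made-up values, used only to show that $γ_{⊥}^{T} η$ takes the same value at every point of the line.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

eta0 = (1.0, 2.0)                     # hypothetical base point on C
gamma = (3.0, -1.0)                   # hypothetical direction of C
gamma_perp = (-gamma[1], gamma[0])    # gamma rotated by 90 degrees

# Points eta = eta0 + theta * gamma on C for several theta values.
points = [
    tuple(e + th * g for e, g in zip(eta0, gamma))
    for th in (-2.0, 0.0, 0.5, 4.0)
]

# gamma_perp^T eta should equal gamma_perp^T eta0 at every point of C.
constraint_values = [dot(gamma_perp, eta) for eta in points]
target = dot(gamma_perp, eta0)
```

Every entry of `constraint_values` equals `target`, so $a = γ_{⊥}$ , $b = γ_{⊥}^{T} η_{0}$ is a linear constraint satisfied on all of $C$ .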